(54) Vectors and use thereof for capturing target genes

(57) The invention relates to secretory gene trap vectors and methods of using such vectors to isolate
extracellular proteins and to make cells and organisms with mutant secretory genes. The vectors encode a type
II transmembrane domain and a lumen-sensitive indicator marker and, optionally, a selectable marker and an
exon-splice acceptor site. The gene isolation methods involve stably introducing the secretory trap vectors into
an endogenous gene, whereby the expression of the resultant fusion protein provides a differential expression
of the indicator marker depending on whether the endogenous gene provided an N-terminal signal sequence.






EP 0 731 169 A1






Description 

proteins. 
BACKGROUND

Secreted proteins are generally, but not [...] ('signal sequence') of roughly 18 to 25 hydrophobic [...]
to the secretory pathway such that the polypeptide [...] membrane for export from the cell. For the [...]
during the secretion process whereby the [...] for [...] polypeptide hormones [...].

Membrane-spanning proteins generally [...] sequences, often of similar size, [...] translocation of the
polypeptide, so resulting in [...]. [...] proteins in this class are all oriented in a type I orientation
[...] the protein is oriented towards the outside of the cell and include, for [...] and receptors for
cytokines. There also exists a minor class of membrane [...] sequence and these proteins may exist in either
a type I or type II orientation (High, 1992). Type II membrane proteins [...] inserted in the membrane such
that the N-terminus remains in the cytosol. The orientation that these [...] proteins adopt [...] determined
by the charge differential across the internal transmembrane domain [...].

A variety of expression cloning [...] years to identify secreted and membrane-spanning proteins
(Simmons, 1993). These [...] cDNAs into expression vectors and screening transfected cells for the appearance
[...] antigenic determinants on the surface of cells. In a recent embodiment of this technique, the "signal
sequence trap" (Tashiro et al. 1993), cDNAs encoding an N-terminal signal sequence were identified by assaying
for the [...] on the [...] of transiently transfected cells. Expression cloning [...] membrane-spanning
proteins are technically demanding and generally favour the [...] species. Moreover, the function and
expression profile of genes isolated by these [...].

"Gene trapping" has been developed to generate random insertional mutations in genes of eukaryotic cells
(Gossler et al. 1989; Brenner et al. 1989; Kerr et al. 1989; [...]). [...] structural components which
facilitate isolation of [...] transcription units so as to form recombinant sequences and other elements which
[...] recombinant sequences to be identified and/or characterised. Thus, for example, the known [...] which are
commonly associated with eukaryotic structural genes such as, for example, splice acceptor sites [...] at the
5' end of all exons and polyadenylation sites which normally follow the final exon. [...] splice acceptor and
polyadenylation sites [...] RNA transcript that contains a portion of the target gene spliced to reporter gene
sequences of the vector. [...] with similar function do not rely on splicing, but instead recombine within
coding sequences of the [...] recombinational events. Each insertion event that activates expression of the
reporter gene [...] represents insertions that disrupt the normal coding sequences of the target [...].
[...] expression of the reporter gene is under the regulatory control of the target gene and [...] reflect the
expression pattern [...] contained in the RNA fusion transcript may [...].

[...] enzyme that lacks a signal sequence [...] the expression of gene fusions in bacteria due to the fact
[...] β-gal [...] large N-terminal fusions without affecting its enzyme activity (Casadaban, Chou & Cohen
1980). Fusions [...] portions of secreted molecules has been used to define the requirement for the
N-terminal [...] sequence [...] (Hall & [...] 1984) [...] suggesting that β-gal [...]
examples, β-gal activity is preserved. In contrast, in Caenorhabditis elegans, β-gal activity is lost in a fusion that contained
the N-terminal signal sequence of a secreted laminin (Fire, Harrison and Dixon, 1990). Including a predicted type I
(Hartmann et al. 1989) transmembrane domain between the signal sequence and β-gal restored enzymatic activity to
the fusion protein, presumably by keeping β-gal in the cytosol.

We have now developed a strategy which solves the problems outlined above and which in its more specific
aspects is based on gene trap protocols. A modified gene trap vector, the secretory trap, was engineered such that the
activity of the β-gal reporter gene is dependent on the acquisition of a signal sequence from the endogenous gene at
the site of vector insertion. Fusions that do not contain a signal sequence fail to activate reporter [...] cannot be
ascertained without considerable additional effort.

SUMMARY OF THE INVENTION 

Methods and compositions for detecting and/or isolating targeted genes are provided. According to a first aspect
of the present invention, there are provided vectors comprising a component which upon insertion into a target
eukaryotic gene produces a modified gene which on expression codes for a polypeptide having a first portion of its amino acid
sequence encoded by a nucleic acid sequence of the eukaryotic gene and a second portion of its amino acid sequence
encoded by a nucleic acid sequence of the vector, characterised in that the vector includes a sequence which confers
on the polypeptide a property which is differentially associated with the presence in the eukaryotic gene of a nucleic acid
sequence coding for an amino acid sequence which results in the said polypeptide being located in a predetermined
spatial relationship with structural components of the host cell.

A particularly useful class of vectors according to the invention are ones wherein the vector includes one or more
sequences which confer or confers on the polypeptide a property which is differentially associated with the presence in
the eukaryotic gene of a signal sequence associated with a secreted or membrane-spanning protein. By being
"differentially associated" with the presence in the target eukaryotic gene of a nucleic acid sequence coding for an amino acid
sequence which results in the chimeric polypeptide being located in a predetermined spatial relationship with structural
components of the host cell, the presence or absence of the differentially associated property allows vectors of the
invention to distinguish between (1) target eukaryotic genes possessing a nucleic acid sequence coding for an amino
acid sequence which results in the chimeric polypeptide being located in a predetermined spatial relationship with
structural components of the host cell and (2) target eukaryotic genes which are devoid of a nucleic acid sequence
coding for an amino acid sequence which results in the chimeric polypeptide being located in a predetermined spatial
relationship with structural components of the host cell. In preferred embodiments of the invention, the vectors of the
invention can distinguish between target eukaryotic genes which (1) code for proteins which possess a signal
sequence, e.g. secreted proteins, and ones which (2) code for proteins which do not possess a signal sequence, e.g.
non-secreted proteins.

The aforementioned "conferring" sequences can, for example, comprise at least a portion of a membrane-associated
protein. In this embodiment, the protein product encoded by a reporter gene element of the vector can be forced
to adopt, on integration into a target gene, one of two configurations, depending on whether or not the gene includes a
signal sequence associated with a secreted or membrane-spanning protein. Thus, for example, if the gene does code for a
secreted or membrane-spanning protein, the membrane-associated protein element of the vector sequence can cause
a reporter gene product to adopt a configuration in relation to cell components such that the reporter gene product is
activated and produces a detectable signal. Alternatively, if the vector is incorporated into a target gene which does not
code for a secreted or membrane-spanning protein, the membrane-associated protein element of the vector sequence
will cause the reporter gene product to adopt a configuration in relation to cell components such that the reporter gene
product is not activated and consequently will not produce a detectable signal.

The membrane-associated protein element of the vector sequence is preferably a type II transmembrane
domain, i.e. a domain which includes a membrane-spanning sequence and any necessary flanking sequences (see
below). Thus preferred vectors include a reporter gene and a sequence encoding a type II transmembrane domain,
preferably placed N-terminally to the reporter, each mutually arranged so that on expression of the modified gene,
detection of reporter polypeptide is dependent upon the eukaryotic gene coding for a secreted protein having a signal
sequence, and the reporter is substantially undetectable if the eukaryotic gene codes for a non-secreted protein. The
reporter generally provides a characteristic phenotype, e.g. an enzymic activity such as β-galactosidase activity.

The vectors of the invention preferably include a nucleic acid sequence which facilitates insertion of said component
into the eukaryotic gene. These sequences may, for example, be (a) sequences associated with elimination of intron
sequences from mRNA, such as, for example, splice acceptor sequences, or (b) polyadenylation signal sequences.
Alternatively, the vector may lack a splice acceptor sequence and thus rely on insertions directly into the coding
sequences of genes.

The subject vectors may also include an element allowing selection and/or identification of cells transformed as a
result of components of the vector having been inserted into the target eukaryotic gene. Such a selectable element
conveys a second property on transformed cells which may be independent of the differentially associated property; for
example a property allowing selection of cells wherein components of the vector have been inserted into the target
eukaryotic gene as a result of conferring antibiotic resistance or the ability to survive and/or multiply on a defined
[...]. Examples of such marker sequences are ones which result in transformed cells being resistant to an antibiotic,
for example G418, or having a varied degree of dependence on a growth factor or nutrient. Vectors possessing
[...] in the chimeric polypeptide possessing both the differentially associated [...] distinct
selectable property are especially preferred, though the two properties can result from the same element (e.g. a
selectable marker which is differentially active according to a predetermined spatial [...] components
of the cell). An example of the former vectors are ones possessing sequences conferring both β-galactosidase
(β-gal) and neomycin (neo) phosphotransferase activities on the chimeric protein. The construct βgeo, combining
sequences [...] both β-galactosidase (β-gal) and neomycin (neo) phosphotransferase activities in a single construct,
is particularly preferred.
As discussed, the selective inactivation of the differentially associated property in chimeric polypeptides that do not
acquire a signal sequence of an endogenous gene depends on the insertion of the chimeric polypeptide in a type II
orientation [...] membrane of the ER. Suitable type II transmembrane domains are preferably identified empirically as
described below. In addition, the orientation of proteins that contain internal transmembrane domains (signal anchor
sequences) but no signal sequence may frequently be predicted from the number of positive [...]
within 15 amino acids either side of the transmembrane domain (Hartmann, Rappaport & Lodish, 1989). However, proteins
with a [...] type I orientation may be forced into a type II orientation if the N-terminus contains [...] positively
charged amino acids. Such orientation-dispositive flanking sequences are readily identified, as shown with CD4, below.
[...] cases it is necessary to retain these dispositive flanking sequences to preserve the type II character of the
domain. [...] these guidelines, transmembrane domains from any of the known type II proteins may be selected for
[...] secretory trap vectors; suitable transmembrane domains include those from type II proteins listed by
Hartmann, Rappaport & Lodish (1989), for example, transmembrane domains of human P-glycoprotein, of human
[...], of rat Golgi sialyltransferase. Alternatively, synthetic or hybrid type II transmembrane domains
may be used.

[...] a target eukaryotic gene [...] a protein
which is located in a predetermined spatial relationship with structural components of the host cell, which comprises
[...] utilising a vector as defined above and detecting the transformed cell by assaying for said property
[...] differentially associated with the presence in the target eukaryotic gene of a nucleic acid sequence coding for
an amino acid sequence which results in said polypeptide being located in a predetermined spatial relationship with
[...]. In a preferred embodiment, the method is a method for isolating a target eukaryotic
gene encoding an extracellular protein, comprising steps: (1) introducing into a plurality of cells a vector [...]
type II transmembrane domain and a lumen-sensitive indicator marker, [...] relative
to said type II transmembrane domain, whereby said vector stably integrates into the genomes of said plurality of
cells to form a plurality of transgenic cells, wherein [...] at least [...] cell of said plurality of cells, said vector stably integrates into
a gene encoding an extracellular protein having an N-terminal signal sequence; (2) incubating said plurality of cells
under conditions wherein said indicator marker is expressed in a preferentially active form as a fusion protein [...] an
N-terminal region of said extracellular protein in said cell or a descendent of said cell, and is unexpressed or expressed
[...] active form in said plurality of cells not expressing said indicator marker as a fusion protein [...] an
N-terminal region of an extracellular protein having an N-terminal signal sequence; (3) detecting the expression of
[...] or a descendent of said cell; and (4) isolating from said cell or a descendent of said
cell a nucleic acid encoding [...] least an N-terminal region of said extracellular protein.
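
The flanking-charge guideline given earlier for predicting the orientation of an internal transmembrane domain can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions: the function names, the example sequence and the handling of ties are hypothetical, only lysine (K) and arginine (R) are counted as positively charged, and the convention that the more positively charged flank remains in the cytosol is a common reading of the rule, not a procedure set out in this specification.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
POSITIVE = frozenset("KR")  # assumption: count only Lys and Arg as positive


def count_positive(segment):
    """Number of positively charged residues in a peptide segment."""
    return sum(1 for aa in segment.upper() if aa in POSITIVE)


def predict_orientation(seq, tm_start, tm_end, flank=15):
    """Guess the orientation of a protein with an internal TM domain.

    seq      -- one-letter amino acid sequence
    tm_start -- index of the first TM residue (0-based, inclusive)
    tm_end   -- index one past the last TM residue
    flank    -- residues examined on each side (15 in the text)
    """
    n_flank = seq[max(0, tm_start - flank):tm_start]
    c_flank = seq[tm_end:tm_end + flank]
    n_pos = count_positive(n_flank)
    c_pos = count_positive(c_flank)
    if n_pos > c_pos:
        return "type II"  # N-terminal flank more positive: N-terminus stays cytosolic
    if c_pos > n_pos:
        return "type I"   # C-terminal flank more positive: N-terminus is translocated
    return "undetermined"


# Hypothetical sequence: positively charged N-terminal flank, a 20-residue
# hydrophobic core and an uncharged C-terminal flank.
example = "MSKRKRG" + "L" * 20 + "SGTSQ"
print(predict_orientation(example, 7, 27))
```

A real domain would of course be chosen empirically, as the text recommends; the sketch only shows how the residue count on each side of the domain could be compared.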

"e^SiTof^ methods comprise a lumen-sensitive marker, i.e. a marker which is preferentially detectable 
when not present in a secretory lumen of a cell, e.g. the ER, Golgi, secretory vesicles, etc. For example, the marker may
be an enzyme such as galactosidase which is preferentially inactivated in the lumen. An equivalent way of practising
the invention is to use a marker which is preferentially detectable when present in the [...] feature [...]
that the marker is differently detectable depending upon whether it is present in or outside the lumen. When the marker is
[...] the lumen, the vectors also comprise a type II transmembrane domain as described
above. In addition, the vectors may comprise a selectable marker which may be the same as or different from the
lumen-[...].

The vectors may be introduced into the cells by any convenient means. For example, with cells in culture, conventional
techniques such as transfection (e.g. lipofection, precipitation, electroporation, etc.), microinjection, etc. may be
used. [...] cells within an organism, introduction may be mediated by virus, liposome, or any other convenient technique.

A wide variety of cells may be targeted by the subject secretory trap vectors, including stem cells [...]
such as zygotes, embryos, ES cells, other stem cells such as lymphoid and myeloid stem cells, neural stem cells,
transformed cells such as tumour cells, infected cells, differentiated cells, etc. The cells may be targeted in culture or in vivo.

The vector stably integrates into the genome (i.e. chromatin) of the target cells. Typically, the vector integrates
randomly into the genome of a plurality of the cells, though in at least one of the cells, the vector integrates into a gene
encoding an endogenous extracellular protein (i.e. secreted or transmembrane protein) having an N-terminal signal sequence
such that the signal sequence is oriented 5' to the vector/insert. Such a cell then acquires a mutated allele of the
extracellular gene comprising at least a portion of the subject vector encoding the lumen-sensitive marker and the type II
transmembrane domain.

The cells comprising the stably introduced vector are incubated under conditions whereby the lumen-sensitive
marker is expressed in a preferentially detectable form as a fusion protein with an N-terminal region of the
extracellular protein, i.e. the fusion protein is preferentially detectable via the marker if the endogenous protein portion includes
a functional signal sequence. The incubation conditions are largely determined by the cell type and may include mitotic
growth and differentiation of the originally transfected cells.

The marker in preferentially detectable form may be detected in any convenient way. Frequently, the preferential
detectability is provided by a change in a marker signal form or intensity such as a color or optical density change. Cells
preferentially expressing such a signal presumptively comprise a fusion protein comprising an endogenous signal
sequence. The nucleic acid encoding such endogenous signal sequence is then isolated from the cell by conventional
methods, typically by cloning the mutant genomic allele or a transcript thereof. In this way, genes encoding known and
novel extracellular proteins are obtained. In addition, the subject methods may be modified to obtain products such
as transgenic animals, cell lines, recombinant secretory proteins, etc., some examples of which are described below.

DESCRIPTION OF FIGURES 

FIG. 1a shows the vectors designed to express CD4/βgeo fusion proteins and summarises the results of transient
transfection experiments.

FIG. 1b shows the design of gene trap vectors (pSAβgeo and pGT1.8geo) and the secretory trap vector
(pGT1.8TM) and their relative efficiency in stable transfections of ES cells. pSAβgeo contains the minimal adenovirus
type 2 major late splice acceptor (SA; open box, intron; shaded box, exon) and the bovine growth hormone
polyadenylation signal. The mutation in neo present in pSAβgeo was corrected by replacement of the ClaI (C)/SphI (S)
fragment of βgeo. pGT1.8geo and pGT1.8TM contain the mouse En-2 splice acceptor (Gossler et al., 1989) and SV40
polyadenylation signal but lack a translation initiation signal (ATG). The secretory trap vector pGT1.8TM includes the
0.7 kb PstI/NdeI fragment of CD4 containing the transmembrane domain inserted in-frame with βgeo in pGT1.8geo.

FIG. 1c depicts our model for the selective activation of βgal in the secretory trap vector.

FIG. 1d shows the relative efficiency of secretory trap vectors designed to capture each of the three reading frames
and the exon trap design. Vectors in each reading frame (pGT1tm to 3tm) were constructed by ExoIII deletion of all but
30 bp of En-2 exon sequences followed by the insertion of BglII linkers. The exon trap vector (pETtm) was made by
removing the En-2 splice acceptor from pGT1.8TM.
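
Why vectors in all three reading frames are useful can be shown with a little modular arithmetic. The Python sketch below is illustrative only: the function names and the idea of describing each vector by a frame offset of 0, 1 or 2 nucleotides are assumptions made for the example, and no claim is made about which of the three vectors corresponds to which offset.

```python
# Illustrative sketch; the offsets 0, 1 and 2 are hypothetical labels for the
# three reading-frame vectors, not values taken from the specification.

def reporter_in_frame(upstream_coding_nt, vector_offset_nt):
    """True if the reporter codons stay in frame after splicing.

    upstream_coding_nt -- coding nucleotides of the endogenous gene that
                          precede the splice junction
    vector_offset_nt   -- nucleotides the vector contributes between the
                          splice junction and the first reporter codon
    """
    return (upstream_coding_nt + vector_offset_nt) % 3 == 0


def matching_offsets(upstream_coding_nt):
    """Offsets (out of 0, 1, 2) that keep the reporter in frame."""
    return [k for k in (0, 1, 2) if reporter_in_frame(upstream_coding_nt, k)]


for upstream in (90, 91, 92):
    print(upstream, matching_offsets(upstream))
```

For any insertion exactly one of the three offsets keeps the reporter in frame, so a vector built in a single frame can report only a subset of otherwise suitable insertions.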

DESCRIPTION OF SPECIFIC EMBODIMENTS 


The following experiments and examples are offered by way of illustration and not by way of limitation. 

Modified gene trap vectors (which we have termed "secretory trap") were developed which rely on capturing the
N-terminal signal sequence of an endogenous gene to generate an active β-galactosidase fusion protein. Using the
prototype vector pGT1.8TM (FIG. 1b), insertions were found in the extracellular domains of a novel cadherin, an
unc-6-related laminin, the sek receptor tyrosine kinase and two receptor-linked protein tyrosine phosphatases, LAR and
PTPκ, thus confirming the selective property of the secretory trap vector to detect insertional mutations in genes
encoding transmembrane and secreted protein products.

The secretory trap strategy was developed starting from lacZ-based gene trap vectors (Gossler et al., 1989;
Brenner et al., 1989; Kerr et al., 1989; Friedrich & Soriano, 1991; Skarnes, Auerbach & Joyner, 1992). These vectors can
create N-terminal β-galactosidase (βgal) fusion products which localise to different compartments of the cell,
presumably reflecting the acquisition of endogenous protein sequences that act as sorting signals (Skarnes et al., 1992; Burns
et al., 1994).

Here, we have exploited the differential sorting of β-gal fusion proteins as a means to capture genes encoding
N-terminal signal sequences, genes therefore likely to be expressed on the cell surface.


Materials and Methods 

Vectors. The βgeo reporter was obtained by replacing the ClaI (unique in lacZ)/SphI (unique in neo) fragment of the
gene trap vector pGT1.8 with the ClaI/SphI fragment of pSAβgeo (Friedrich & Soriano, 1991). pGT1.8 is a derivative of
pGT4.5 (Gossler et al., 1989) where the 3' En-2 sequences were replaced with the 0.2 kb BclI/BamHI SV40 polyA
signal. The parental vector pActβgeo contains the 0.5 kb human β-actin promoter (Joyner, Skarnes & Rossant, 1989)
linked to the βgeo/SV40 polyA cassette. The start of βgeo translation was engineered to contain a Kozak consensus
sequence with unique SalI and NruI sites on either side for generating subsequent fusions. SalI sites were placed at
each end of a BalI fragment containing the entire coding region of the rat CD4 cDNA (Clark et al., 1987). A 0.45 kb
SalI/KpnI fragment containing the N-terminus of CD4 or a 1.4 kb SalI/NdeI fragment containing the entire CD4 coding
region was cloned into SalI/NruI-digested pActβgeo to generate pActSSβgeo and pActSSTMβgeo, respectively. The
secretory trap vector pGT1.8TM includes the 0.7 kb PstI/NdeI fragment of CD4 containing the transmembrane domain
(TM) inserted in-frame with βgeo in pGT1.8geo.

ES cell culture. CGR8 ES cells (a feeder-independent cell line derived from strain 129/Ola mice by J. Nichols; Mountford
et al., 1994) were maintained in Glasgow MEM/BHK12 medium containing 0.23% sodium bicarbonate, 1X MEM
[...]essential amino acids, 2 mM glutamine, 1 mM pyruvate, 50 µM β-mercaptoethanol, 10% foetal calf serum
(Globepharm) and 100 units/ml DIA/LIF. Transiently transfected cells were obtained by electroporating 10[...] ES cells with 100
[...] plasmid DNA in a volume of 0.8 ml PBS using a Bio-Rad Gene Pulser set at 250 µF/250 V and cultured for 36 hours
on [...] coverslips prior to analysis. To obtain stable cell lines, between 5 x 10[...] and 10[...] CGR8 ES cells were
electroporated ([...]) with 150 [...] linearised plasmid DNA, 5 x 10[...] cells were plated on 10 cm dishes and colonies
were selected in 200 ng/ml Geneticin (Gibco). To assay βgal enzyme activity and protein, ES cells were grown on
gelatinized coverslips and stained with X-gal or with polyclonal rabbit α-βgal antiserum and FITC-conjugated donkey α-rabbit
IgG (Jackson ImmunoResearch). To permeabilize membranes, cells were treated with 0.5% NP-40 prior to antibody [...].

RNA analysis and RACE cloning. Northern blots and RACE were carried out as previously described (Skarnes et al.,
1992). Several modifications were incorporated into the 5' RACE procedure used previously (Skarnes, Auerbach &
Joyner, 1992): 1) microdialysis (0.025 micron filters, Millipore) was used in place of ethanol precipitations, 2) nested PCR
(30 cycles each) was carried out using an anchor primer and a primer specific to CD4 followed by size selection on
agarose gels and a second round of PCR with the anchor and the En-2 256 primer and 3) chromospin 400 columns
(Clontech) were used to size select XbaI/KpnI-digested PCR products prior to cloning.



Results 

To test if βgal fusions that contain an N-terminal signal sequence could be identified by their [...],
vectors were constructed to express portions of the CD4 type I membrane protein (Clark et al., 1987) fused to βgeo, a
chimeric protein that possesses both βgal and neomycin phosphotransferase activities (Friedrich & Soriano, 1991) (FIG.
1a). βgeo fused to the signal sequence of CD4 (pActSSβgeo) accumulated in the endoplasmic reticulum (ER) but
lacked βgal activity. Therefore, translocation of βgeo into the lumen of the ER appeared to abolish βgal enzyme
function. βgal activity was restored by including the transmembrane domain of CD4 (pActSSTMβgeo), presumably by
keeping [...] was associated with the ER and in multiple cytoplasmic inclusions, a pattern only
[...] with the conventional gene trap vector, probably because insertions downstream
of both a signal sequence and transmembrane domain of genes encoding membrane-spanning proteins are
infrequent. Therefore, to identify insertions in both secreted and type I membrane proteins, our gene trap vector
pGT1.8geo was modified to include the transmembrane domain of CD4 upstream of βgeo (FIG. 1b). Vectors were
linearised prior to electroporation at either the ScaI (Sc) site in the plasmid backbone (represented by the line) of pSAβgeo
or at the HindIII (H) site at the 5' end of the En-2 intron. The number of G418-resistant colonies obtained in two
electroporation experiments (Expt 1: 5 x 10[...] cells; Expt 2: 10[...] cells) and the proportion that express [...]
is indicated on the right. With the secretory trap vector pGT1.8TM, βgal enzyme activity is restored to any insertion
[...]
electroporation into ES cells. Although pSAβgeo contains a start of translation which is absent in our vectors, fewer
G418-resistant colonies were obtained with pSAβgeo than with pGT1.8geo. More importantly, nearly all the colonies
derived with pSAβgeo showed high levels of βgal activity, whereas our vector showed a broad range of staining
intensities and a greater proportion of βgal-negative colonies. Sequence analysis of the pSAβgeo vector revealed a point
mutation in neo known to reduce its enzyme activity (Yenofsky, Fine & Fallow, 1991). Therefore, the pSAβgeo vector
appears to pre-select for genes expressed at high levels, and correction of the neo mutation in our vectors now allows
us to access genes expressed at low levels (see below).

Approximately half of the pGT1.8geo colonies express detectable βgal activity and show various subcellular
patterns of βgal staining observed previously. In contrast, only 20% of the pGT1.8TM colonies express βgal [...] and
display the "secretory" pattern of βgeo activity characteristic of the pActSSTMβgeo fusion. Stable cell lines transfected
with pGT1.8TM in most cases showed detectable βgal activity in undifferentiated ES cells; however, we occasionally
found ES cell lines that exhibited detectable βgal activity only in a subset of differentiated cell types. The reduction in
the proportion of βgal-positive colonies and the singular pattern of βgal staining observed with the secretory trap vector
suggested that βgal activity is retained only in fusions that contain an N-terminal signal sequence and that βgal activity,
but not neo activity, is lost in fusions with proteins that do not possess a signal sequence.

Our data indicate that in the absence of a cleavable N-terminal signal sequence, the fusion protein behaves as a type
II membrane protein (High, 1992), placing βgeo in the ER lumen where the βgal enzyme is inactive (FIG. 1c). To
confirm this, several βgal-negative cell lines were isolated and analysed by immunofluorescence. βgal-negative cell lines
were identified from immunodotblots of whole cell lysates using α-βgal antibodies and the ECL detection system
(Amersham). From a screen of 48 colonies, three βgal-negative cell lines were recovered and analysed by
immunofluorescence. In these lines, the fusion protein was detected on the surface of cells in the absence of detergent
permeabilization, indicating a type II orientation of the βgeo fusion protein. In contrast, detergent permeabilization was
essential to detect the fusion protein in βgal-positive cell lines, as would be expected for type I membrane proteins.

A model for the observed selective activation of βgal in the secretory trap vector is presented in FIG. 1c. Insertion
of pGT1.8TM (hatched box) in genes that contain a signal sequence produces fusion proteins that are inserted in the
membrane of the endoplasmic reticulum in a type I configuration. The transmembrane domain of the vector retains βgal
in the cytosol where it remains active. Insertion of the vector in genes that lack a signal sequence produces fusion
proteins with an internal TM domain. In these fusions, the transmembrane domain acts as a signal anchor sequence (High,
1992) to place βgeo in a type II orientation, exposing βgeo to the lumen of the ER where βgal activity is lost. This
dependence of enzyme activity on acquiring an endogenous signal sequence provides a simple screen for insertions
into genes that encode N-terminal signal sequences. Further proof for this model has come from cloning several genes
associated with several secretory trap insertions.
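
The model described above reduces to a simple decision rule, sketched here in Python. The function and field names are hypothetical, and the sketch only restates the model of FIG. 1c; it ignores complications such as improperly spliced fusion transcripts.

```python
# Illustrative restatement of the model; all names are hypothetical.

def secretory_trap_outcome(endogenous_signal_sequence):
    """Predicted fate of a reporter fusion made by a secretory trap insertion."""
    if endogenous_signal_sequence:
        # Signal sequence plus vector TM domain: type I configuration,
        # with the reporter retained on the cytosolic side of the ER membrane.
        return {
            "orientation": "type I",
            "reporter_side": "cytosol",
            "bgal_active": True,
        }
    # No signal sequence: the vector TM domain acts as a signal anchor,
    # giving a type II orientation with the reporter in the ER lumen.
    return {
        "orientation": "type II",
        "reporter_side": "ER lumen",
        "bgal_active": False,
    }


for has_signal in (True, False):
    print(has_signal, secretory_trap_outcome(has_signal))
```

Only the enzyme activity of the reporter is modelled; drug resistance is left out because the text reports that neo activity is retained in both cases.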

5' RACE (rapid amplification of cDNA ends) was used to clone a portion of the endogenous gene associated with
secretory trap insertions that express detectable βgal activity (Table 1). Northern and RNA dot blot analysis showed that
approximately one half (5 of 11 analyzed in this study) of the G418-resistant cell lines fail to properly utilize the splice
acceptor and produce fusion transcripts that hybridize with intron sequences of the vector. These insertions presumably
do not represent true gene trap events and thus were not analyzed further. Northern blot analysis of six properly-spliced
lines detected a unique-sized βgal fusion transcript in each cell line. For these experiments, a Northern blot of 15 µg ES
cell RNA was hybridised with the lacZ gene and reprobed with a RACE cDNA fragment cloned from the ST534 (LAR)
insertion. At least two independent RACE cDNAs were cloned from each cell line. The cDNAs obtained from all cell lines
except ST514 detected both the fusion transcript and an endogenous transcript common to all cell lines, as shown for
the ST534 probe. The ST514 insertion illustrates that genes expressed at very low levels in ES cells can be trapped. In
ST514 cultures, βgal activity was observed only in a few differentiated cells and accordingly neither the fusion nor the
endogenous transcripts could be detected on Northern blots.

Sequence analysis of the RACE cDNAs in all cases showed the proper use of the splice acceptor and a single open
reading frame in-frame with βgeo. One insertion occurred in netrin, a secreted laminin homologous to the unc-6 gene
of C. elegans (Ishii et al. 1992) recently cloned in the chick (Serafini et al. 1994). The remaining five insertions
interrupted the extracellular domains of membrane-spanning proteins: a novel cadherin most closely related to the fat
tumour suppressor gene of Drosophila (Mahoney et al. 1991), the sek receptor tyrosine kinase (Gilardi-Hebenstreit et
al. 1992), the receptor-linked protein tyrosine phosphatase PTPκ (Jiang et al. 1993), and two independent insertions in
a second receptor-linked tyrosine phosphatase, LAR (Streuli et al. 1988). These results support the prediction that βgal
activity is dependent on acquiring an N-terminal signal sequence from the endogenous gene at the site of insertion.

The pattern of βgal expression in embryos derived from insertions in the sek (ST497) and netrin-1 (ST514) genes
was very similar to published RNA in situ results for the mouse sek (Nieto et al., 1992) and chick netrin (Kennedy et al.,
1994) genes, providing further proof that gene trap vectors accurately report the pattern of endogenous gene
expression (Skarnes, Auerbach & Joyner, 1992). For these experiments, chimeric embryos and germline mice were generated
by injection of C57Bl/6 blastocysts (Skarnes, Auerbach & Joyner, 1992). Embryos at the appropriate stages were
dissected, fixed and stained with X-gal as described (Beddington et al., 1989). Both insertions in LAR (ST484, ST534)
exhibited weak, widespread expression in 8.5d embryos. The insertion in PTPκ (ST531) showed βgal expression in
endoderm and paraxial mesoderm, highest in newly condensing somites. βgal expression in tissues of adult mice
carrying insertions in LAR and PTPκ correlated well with known sites of mRNA expression (Jiang et al., 1993; Longo et
al., 1993). Highest levels of βgal activity were found in the lung, mammary gland and brain of ST534 (LAR) mice and in
the kidney, brain and liver of ST531 (PTPκ) mice.

ES cell lines containing insertions in the LAR, PTPκ, and sek genes have been transmitted to the germline of mice.
Following germline transmission of the PTPκ and LAR insertions, breeding analysis showed that mice homozygous for
either insertion are viable and fertile. To confirm that the LAR and PTPκ genes were effectively disrupted, Northern
blots of RNA from wild-type and homozygous adult tissues were probed with cDNAs from regions downstream of each
insertion site. Northern blots of 10 µg RNA from wild-type (+/+), heterozygous (+/-) and homozygous (-/-) lung of
ST534 (LAR) and kidney of ST531 (PTPκ) adult mice were hybridized with LAR and PTPκ cDNA sequences 3' to the
insertion and reprobed with the ribosomal S12 gene as a loading control. For both mutations, normal full-length
transcripts were not detected in homozygous animals.

Because secretory trap insertions generate fusions that in some cases will contain a large portion of the
extracellular domain of the target gene, the production of both loss-of-function and gain-of-function (i.e., dominant-negative)
mutations is possible. However, since the βgeo fusions with LAR and PTPκ include less than 300 amino acids of the
extracellular domains of these proteins, these insertions likely represent null mutations. LAR and PTPκ are members
of an ever-increasing family of receptor PTP genes (Saito, 1993). The absence of overt phenotypes in LAR and PTPκ
mutant mice is likely due to functional overlap between gene family members, as has been observed with targeted
mutations in multiple members of the myogenic and Src-family genes (Rudnicki et al., 1993; Stein, Vogel & Soriano,
1994; Lowell, Soriano & Varmus, 1994).

Based on the first six genes identified, the secretory trap shows a preference for large membrane-spanning
receptors. The recovery of two independent insertions in LAR further suggests that the current vector design will access a
limited class of genes. The requirement for gene trap vectors to insert in introns of genes is predicted to impose an
inherent bias in favour of detecting genes composed of large intronic regions and consequently limit the number of
genes accessible with this approach. To access a larger pool of genes, we have constructed vectors in each of the three
reading frames. Furthermore, to recover insertions in smaller transcription units composed of few or no introns,
we have constructed an exon trap version of the vector that lacks a splice acceptor. The relative efficiencies of secretory
trap vectors engineered in all three reading frames and the exon trap vector are given in FIG. 1d. Electroporations of
CGR8 ES cells were carried out as described above (Expt 1: 2x10* cells; Expt 2: 10^8 cells). Each vector yielded similar
numbers of G418-resistant colonies, a similar proportion of which exhibit the secretory pattern of βgal activity. With a
combination of vectors, one obtains a more representative sampling of the genome that should include both
membrane-spanning and secreted proteins.

In this invention we have shown that the βgeo reporter gene can be modified to contain
an N-terminal transmembrane domain. Integration into an endogenous gene encoding an N-terminal signal sequence
produces a fusion protein that assumes a type I configuration, keeping βgal in the cytosol where it retains functional
enzymatic activity (see Fig. 1c). Conversely, if the modified reporter integrates into a gene that does not encode a
signal sequence, the hydrophobic transmembrane domain itself is now recognised by the cell as a signal anchor sequence
to place the fusion protein in a type II orientation, whereupon the βgal enzyme is inactivated (see Fig. 1c). Therefore,
a construct in which βgal or βgeo is prefixed by an N-terminal type II transmembrane domain has a unique property. If
it integrates into a secretory gene encoding a signal sequence, the βgal remains active. If it integrates into
a non-secretory gene, βgal activity is blocked. This permits integrations into secretory genes to be identified by a simple
assay (e.g. color change) for reporter gene activity.
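
The selection logic described above can be summarised in a short Python sketch. It is only an illustration of the stated rule (signal sequence present: type I fusion, active reporter; signal sequence absent: type II fusion, inactive reporter). The class `Insertion`, its field names, and the example records are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insertion:
    """Hypothetical record describing one vector integration event."""
    name: str
    # True if the trapped endogenous gene supplies an N-terminal signal sequence.
    host_has_signal_sequence: bool


def fusion_orientation(insertion: Insertion) -> str:
    """Return the membrane orientation the fusion protein is expected to adopt.

    With an upstream signal sequence the vector's transmembrane domain gives a
    type I configuration; without one it acts as a signal anchor and the
    fusion adopts a type II orientation.
    """
    if insertion.host_has_signal_sequence:
        return "type I"
    return "type II"


def reporter_active(insertion: Insertion) -> bool:
    """beta-gal stays in the cytosol, and so stays active, only in type I fusions."""
    return fusion_orientation(insertion) == "type I"


if __name__ == "__main__":
    # Illustrative inputs only; these are not data from the specification.
    examples = [
        Insertion("secretory gene (signal sequence present)", True),
        Insertion("non-secretory gene (no signal sequence)", False),
    ]
    for ins in examples:
        state = "active" if reporter_active(ins) else "inactive"
        print(f"{ins.name}: {fusion_orientation(ins)}, reporter {state}")
```

In practice the read-out is the staining assay itself; the sketch only makes explicit why a positive reporter signal implies that the trapped gene encodes a signal sequence.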



TABLE 1

Identification of the endogenous gene associated with six secretory trap insertions

cell line   transcript size (kb)²       gene³             phenotype⁴
            fusion      endogenous                        (wt:het:hom)

484         7.5         7.5             LAR (604)         NA
497         6.5         7               sek (439)         ?
514         ND          ND              netrin (404)      ?
519         >12         >12             novel cadherin    NA
531         6.1         5.3             PTPκ (288)        viable (36:57:27)
534         6.0         7.5             LAR (228)         viable (36:79:25)

βgal expression¹ (columns ES and diff): only the entries +, +/-, +/-, + survive, and they cannot be assigned to rows.

¹ based on X-gal staining of ES cell cultures that contain a subset of spontaneously differentiated
(diff) cell types. (+/-) indicates expression in a subset of differentiated cell types.
² transcript sizes were determined from Northern blots (Fig 4 and data not shown). ND, not
detected.
³ numbers in parentheses indicate the insertion site within the endogenous gene based on the amino
acid sequence of rat LAR (AC L11586), mouse sek (AC S5t422), chick netrin 1 (AC L34549),
and mouse PTPκ (AC L10106). The Genbank accession number for the novel cadherin is (to
be submitted).
⁴ based on the recovery of homozygous animals at weaning age in litters from heterozygous
intercrosses. (?) phenotype unknown, breeding in progress. NA, not applicable, insertion not
yet in germline.
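
As a worked illustration of how the genotype counts in the phenotype column of Table 1 can be read, the following Python sketch compares them with the 1:2:1 (wt:het:hom) ratio expected from heterozygous intercrosses when homozygotes are fully viable. The chi-square goodness-of-fit test is an assumption made here for illustration; the specification itself reports only the counts and the conclusion that homozygotes are viable. The function name is hypothetical.

```python
def mendelian_chi_square(wt: int, het: int, hom: int) -> float:
    """Chi-square statistic for observed counts against a 1:2:1 expectation."""
    total = wt + het + hom
    expected = (total / 4, total / 2, total / 4)
    observed = (wt, het, hom)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard critical value of the chi-square distribution for 2 degrees of
# freedom at the 5% significance level.
CRITICAL_5PCT_DF2 = 5.991

# Counts (wt, het, hom) taken from the phenotype column of Table 1.
counts = {
    "ST531 (PTPk)": (36, 57, 27),
    "ST534 (LAR)": (36, 79, 25),
}

if __name__ == "__main__":
    for line, (wt, het, hom) in counts.items():
        stat = mendelian_chi_square(wt, het, hom)
        verdict = "consistent with" if stat < CRITICAL_5PCT_DF2 else "deviates from"
        print(f"{line}: chi-square = {stat:.2f}, {verdict} 1:2:1")
```

Under this assumed test neither line departs significantly from the Mendelian expectation, which agrees with the statement that homozygous animals are recovered at weaning age.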

Claims 

1. A vector comprising

a component which upon insertion into a target eukaryotic gene produces a modified gene which on expression
codes for a polypeptide having (1) a portion of its amino acid sequence encoded by a nucleic acid sequence
of the target eukaryotic gene and (2) a portion of its amino acid sequence encoded by a nucleic acid sequence of
the vector,

characterised in that the vector includes one or more sequences which confer on the polypeptide a property which
is differentially associated with the presence in the target eukaryotic gene of a nucleic acid sequence coding for an
amino acid sequence which results in said polypeptide being located in a predetermined spatial relationship with
structural components of the host cell.

2. A vector according to claim 1 comprising a nucleic acid sequence which facilitates detection of said component in
the target eukaryotic gene, one or more sequences which confer or confers on the polypeptide a property which is
differentially associated with the presence in the target eukaryotic gene of a signal sequence associated with a
secreted or membrane-spanning protein, and a nucleic acid marker sequence allowing selection and/or identification
of cells transformed as a result of components of the vector having been inserted into the target eukaryotic
gene.

3. A vector according to claim 1 comprising a reporter gene which includes one or more sequences which confer or
confers on the polypeptide a property which is differentially associated with the presence in the target eukaryotic 
gene of a signal sequence associated with a secreted or membrane-spanning protein, and wherein the protein 
product encoded by a reporter gene element of the vector can be forced to adopt, on integration into a target gene, 
one of two configurations, depending on whether or not the target gene includes a signal sequence associated with 
a secreted or membrane-spanning protein, wherein said amino acid sequence which results in said polypeptide 
being located in a predetermined spatial relationship with structural components of the host cell is a signal 
sequence and the predetermined spatial relationship consists of secretion from the cell. 

4. A method for isolating a target eukaryotic gene encoding an extracellular protein, said method comprising steps:

(1) introducing into a plurality of cells a vector encoding a type II transmembrane domain and a lumen-sensitive
indicator marker, wherein said indicator marker is oriented 3' relative to said type II transmembrane domain,
whereby said vector stably integrates into the genomes of said plurality of cells to form a plurality of transgenic
cells, wherein in at least one cell of said plurality of cells, said vector stably integrates into a gene encoding an
extracellular protein having an N-terminal signal sequence;

(2) incubating said plurality of cells under conditions wherein said indicator marker is expressed in a preferentially
active form as a fusion protein with an N-terminal region of said extracellular protein in said cell or a
descendent of said cell, and is unexpressed or expressed in a preferentially inactive form in said plurality of
cells not expressing said indicator marker as a fusion protein with an N-terminal region of an extracellular
protein having an N-terminal signal sequence;

(3) detecting the expression of active indicator marker at said cell or a descendent of said cell;

(4) isolating from said cell or a descendent of said cell a nucleic acid encoding at least an N-terminal region of said
extracellular protein.

5. A method according to claim 4, wherein said vector further encodes a selectable marker.

6. A method according to claim 4, wherein said cells are embryonic stem cells.

7. A method according to claim 4, wherein said preferentially active form is a detectable amount of a catalytic activity
and said preferentially inactive form is an indetectable amount of said catalytic activity.

8. A method for making a transgenic cell comprising a mutation in a gene encoding an extracellular protein, said
method comprising steps:

(1) introducing into a plurality of cells a vector encoding a type II transmembrane domain and a lumen-sensitive
indicator marker, wherein said indicator marker is oriented 3' relative to said type II transmembrane domain,
whereby said vector stably integrates into the genomes of said plurality of cells to form a plurality of transgenic
cells, wherein in at least one cell of said plurality of cells, said vector stably integrates into a gene encoding an
extracellular protein having an N-terminal signal sequence;

(2) incubating said plurality of cells or descendents of said plurality of cells under conditions wherein said
indicator marker is expressed in a preferentially active form as a fusion protein with an N-terminal region of said
extracellular protein in said cell or a descendent of said cell, and is unexpressed or expressed in a preferentially
inactive form in said plurality of cells not expressing said indicator marker as a fusion protein with an N-terminal
region of an extracellular protein having an N-terminal signal sequence;

(3) detecting the expression of active indicator marker at said cell or a descendent of said cell; wherein said
cell is a transgenic cell comprising a mutation in a gene encoding an extracellular protein.






9. A method according to claim 8, wherein said vector further encodes a selectable marker.

10. A method according to claim 8, wherein said cells are pluripotent cells. 

11. A method according to claim 8, wherein said preferentially active form is a detectable amount of a catalytic activity 
and said preferentially inactive form is an indetectable amount of said catalytic activity. 